A deep learning model for predicting COVID-19 ARDS in critically ill patients

Background: The coronavirus disease 2019 (COVID-19) is an acute infectious pneumonia caused by a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection previously unknown to humans. However, predictive studies of acute respiratory distress syndrome (ARDS) in patients with COVID-19 are limited. In this study, we attempted to establish predictive models to predict ARDS caused by COVID-19 via a thorough analysis of patients' clinical data and CT images.
Method: The data of included patients were retrospectively collected from the intensive care unit in our hospital from April 2022 to June 2022. The primary outcome was the development of ARDS after ICU admission. We first established two individual predictive models based on extreme gradient boosting (XGBoost) and convolutional neural network (CNN), respectively; then, an integrated model was developed by combining the two individual models. The performance of all the predictive models was evaluated using the area under the receiver operating characteristic curve (AUC), confusion matrix, and calibration plot.
Results: A total of 103 critically ill COVID-19 patients were included in this research, of whom 23 patients (22.3%) developed ARDS after admission. Five predictive variables were selected and further used to establish the machine learning models, and the XGBoost model yielded the most accurate predictions with the highest AUC (0.94, 95% CI: 0.91–0.96). The AUC of the CT-based convolutional neural network predictive model and the integrated model was 0.96 (95% CI: 0.93–0.98) and 0.97 (95% CI: 0.95–0.99), respectively.
Conclusion: An integrated deep learning model could be used to predict COVID-19 ARDS in critically ill patients.


Introduction
The coronavirus disease 2019 (COVID-19) is an acute infectious pneumonia caused by a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection (1). Evidence has shown that 33% of COVID-19 patients are at high risk of progressing into severe cases, which are accompanied by increasing mortality and morbidity (2, 3). Moreover, severe SARS-CoV-2 infection may directly lead to acute respiratory distress syndrome (ARDS), and the manifestations could be viewed as a combination of pneumonia and ARDS (4).
Although significant advances have been made in understanding and managing ARDS, the morbidity and mortality of patients diagnosed with ARDS still remain high (5).
Unfortunately, the benefits of different therapies for established ARDS are limited (6)(7)(8). Consequently, the paradigm for the management of ARDS has shifted from treatment to prevention. Identification of patients at high risk of ARDS is important for clinicians to implement effective, preventive therapies to reduce the burden of ARDS. It is reported that the median time from the onset of COVID-19 symptoms to intubation is 8.5 days when COVID-19 ARDS occurs (9). Several studies have focused on the early prediction of ARDS and have described well-known risk factors associated with ARDS (10)(11)(12). However, COVID-19 ARDS is a serious complication of COVID-19 with clinical features that differ from those of pre-COVID-19 ARDS (13). Hence, a clinical tool tailored for predicting COVID-19 ARDS is urgently needed.
In recent years, artificial intelligence (AI) has emerged as a promising tool in the medical field. Its ability to handle massive data could help with disease diagnostics and prognostics, radiographic recognition, and personalized treatment (14). During the COVID-19 pandemic, first-hand CT and clinical datasets helped clinicians make decisions and better understand the viral infection. For example, elevated levels of inflammatory cytokines and a reduction of T-cell subsets are closely related to COVID-19 pneumonia (15). The radiological features of COVID-19 pneumonia include a peripheral distribution of opacification, ground-glass opacities, and vascular thickening and enlargement (16). Despite the distinct features observed in COVID-19 patients, clinicians may find it hard to identify the underlying correlations between the clinical features and the features of CT slices, which hinders a comprehensive understanding of the disease. Here, we aimed to provide a method that pools all of the patients' features, including CT and clinical features, to improve the precision of the prediction of COVID-19 ARDS.

Methods
This is a retrospective study approved by the institutional Ethics Committees at Shanghai Renji Hospital, and informed patient consent was waived.

Study patients
All patients admitted to the intensive care unit in Shanghai Renji Hospital between April 2022 and June 2022 were screened for eligibility. Inclusion criteria were as follows: (1) patients who were 18 years old and above; and (2) patients who met the diagnosis of COVID-19 ARDS. Exclusion criteria were as follows: (1) patients who were diagnosed with ARDS within the first day of admission; (2) patients with more than 20% of clinical data missing; and (3) patients without any CT scan results.

Diagnosis of COVID-ARDS
SARS-CoV-2 infection can be identified by the detection of viral RNA in nasopharyngeal secretions via PCR test. The diagnosis of COVID-19 was confirmed by the patients' clinical history, epidemiological contact, and a positive SARS-CoV-2 test.

Data collection
We collected the first sets of chest CT images and clinical data after the patients' admission to the intensive care unit. The clinical data included demographic information, comorbidity conditions, respiratory support methods, onset symptoms, vital signs at admission, aeration variables, routine blood tests, inflammation tests, biochemical tests, blood coagulation tests, lymphocyte subset tests, and cytokine profile tests. Original CT images both in JPG and DICOM format of the included patients were collected. In this study, we randomly divided the patients into training and validation cohorts in a ratio of 7:3.
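The 7:3 random division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_patients`, the fixed seed, and the use of integer patient IDs are assumptions. The split is done at the patient level so that all CT slices of one patient stay in the same cohort.

```python
import random

def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# 103 hypothetical patient IDs, matching the cohort size reported in this study.
train_ids, val_ids = split_patients(range(103))
print(len(train_ids), len(val_ids))  # 72 31
```

With 103 patients, a 7:3 ratio gives 72 training and 31 validation patients under this rounding rule.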

Statistical analysis
The categorical variables were presented as counts and corresponding proportions and were compared using the chi-square test or Fisher's exact test. The continuous variables were reported as the median and the interquartile range; the Mann-Whitney U-test was applied to compare the differences between the groups. Multivariate logistic regression was performed to identify the independent risk factors associated with COVID-19 ARDS. A nomogram plot was further established based on the result of the multivariate logistic regression. A two-tailed P-value of <0.05 was considered significant. The data analysis in this study was completed using Python version 3.8 and R version 4.0.5.

The COVID-ARDS prediction based on clinical features
Four different machine learning algorithms were implemented to establish the predictive models for COVID-19 ARDS, including logistic regression (LR), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). The training cohort was divided into five partitions, of which four were used to train the models, and the remaining one was used to validate the models. The hyperparameters of all the models were fine-tuned for the highest area under the receiver operating characteristic curve while avoiding the problem of overfitting. We followed two specific rules when searching for the best hyperparameters: (1) the training loss was the lowest among all tested combinations of hyperparameters; (2) the log loss in the validation cohort was less than -log 0.5 and higher than that in the training cohort. Grid search with 5-fold cross-validation was applied to search for the most appropriate hyperparameters in the training cohort. Finally, the predictive performance of the established models was compared in the validation cohort.
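Rule (2) and the five-partition scheme can be illustrated with a short sketch. It is written in plain Python under stated assumptions: the helper names (`log_loss`, `passes_validation_rule`, `five_fold_indices`) are hypothetical, and the folds are contiguous rather than shuffled. The threshold -log 0.5 (about 0.693) is the log loss of a classifier that always predicts a probability of 0.5, so the rule requires the model to beat that baseline on held-out data.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def passes_validation_rule(train_loss, val_loss):
    """Rule (2): validation loss is below -log(0.5) and above the training loss."""
    return train_loss < val_loss < -math.log(0.5)

def five_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs over k contiguous folds of range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size
```

In a grid search, each hyperparameter combination would be scored on the five folds, and only combinations whose losses satisfy `passes_validation_rule` would be retained.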

The labeling of individual CT slices
We first manually labeled 897 slices from 30 patients to train the classification model for individual CT slices. The CT slices were classified into two types: (1) normal CT, in which the image features of the lungs were consistent with healthy lungs; (2) abnormal CT, in which the image features were associated with COVID-19 pneumonia. Two senior ICU clinicians (ZH and YG) independently labeled individual CT slices. Any disagreements were resolved through discussion. The deep learning framework was based on the architecture of VGG-16, which consists of 13 convolutional layers and 3 fully connected layers. We further internally validated the classification model and used it to label the remaining 2,300 CT slices. Finally, every CT slice was classified as a normal CT image or an abnormal CT image.
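The layer counts quoted for VGG-16 can be checked against the standard published configuration. The sketch below only counts layers and tracks the feature-map size; it does not build or train a network. The two-class output width and the 224-pixel input size are assumptions, since the text does not state them.

```python
# Standard VGG-16 configuration: numbers are the output channels of 3x3
# convolutions, "M" marks a 2x2 max-pooling layer.
VGG16_CONV_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                     512, 512, 512, "M", 512, 512, 512, "M"]
# Fully connected layer widths; the final width of 2 (normal vs. abnormal)
# is an assumption. The original ImageNet network ends in 1000 classes.
FC_WIDTHS = [4096, 4096, 2]

def count_layers(conv_config, fc_widths):
    """Count convolutional, pooling, and fully connected layers."""
    n_conv = sum(1 for item in conv_config if item != "M")
    n_pool = sum(1 for item in conv_config if item == "M")
    return {"conv": n_conv, "pool": n_pool, "fc": len(fc_widths)}

def feature_map_size(input_size, conv_config):
    """Spatial size after all pooling layers (padded 3x3 convolutions keep size)."""
    size = input_size
    for item in conv_config:
        if item == "M":
            size //= 2
    return size

print(count_layers(VGG16_CONV_CONFIG, FC_WIDTHS))  # {'conv': 13, 'pool': 5, 'fc': 3}
print(feature_map_size(224, VGG16_CONV_CONFIG))    # 7 (224 is an assumed input size)
```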

The COVID-ARDS prediction based on CT images
After the auto-labeling of individual CT slices, we treated an abnormal CT slice classified by the model as a positive case. Then, the probability of being an abnormal CT slice was calculated for every CT image. The 10 most probable abnormal CT slices of a single patient were taken as the representative CT images and were input into the second VGG-16 network. This convolutional neural network (CNN) allows for the shift from the prediction of COVID-19 ARDS based on individual CT slices to the prediction based on a single patient. The VGG-16 network consists of 1 input layer, 13 convolutional layers, 3 fully connected layers, and 1 output layer. The convolutional layers were used to handle feature extraction and representation. The pooling layers were used for filtering out redundant information under the max-pooling strategy. In the final layers of the network, the probability of being a positive case was calculated. For the individual CT-based prediction, the probability ranged from 0 to 1, where values near 0 indicate a normal CT slice and values near 1 indicate an abnormal CT slice. For the single patient-based prediction, the probability ranged from 0 to 1, where higher values indicate a patient predicted to develop COVID-19 ARDS.
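The selection of the 10 most probable abnormal slices per patient can be sketched as a simple ranking step. The function name and the probability values below are hypothetical; how patients with fewer than 10 slices were handled is not stated in the text, and this sketch simply returns all available slices in that case.

```python
def select_representative_slices(slice_probs, k=10):
    """Return the k slice IDs with the highest predicted abnormal probability.

    slice_probs: dict mapping slice ID -> probability from the slice classifier.
    If a patient has fewer than k slices, all slices are returned.
    """
    ranked = sorted(slice_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [slice_id for slice_id, _ in ranked[:k]]

# Hypothetical probabilities for one patient with 12 slices.
probs = {f"slice_{i:02d}": p for i, p in enumerate(
    [0.05, 0.91, 0.40, 0.88, 0.97, 0.12, 0.76, 0.33, 0.99, 0.64, 0.58, 0.81])}
top10 = select_representative_slices(probs)
print(top10[0])  # slice_08, the slice with probability 0.99
```

The selected slices would then be passed to the second, patient-level network.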

The integration of prediction models
The integration of two prediction models based on CT images and clinical data was achieved by the penalized logistic regression algorithm. The L2 regularization of the penalized logistic regression algorithm was used. To be specific, the machine learning model based on clinical features and the CNN model based on CT images individually generated two scores for the prediction of COVID-19 ARDS, which were taken as input features for the penalized logistic regression algorithm. At last, the penalized logistic regression algorithm calculated a prediction score for the COVID-19 ARDS outcome.
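The stacking step can be sketched as a two-feature logistic regression with an L2 penalty. The version below is a minimal plain-Python illustration fitted by batch gradient descent; the penalty strength, learning rate, iteration count, and the six example score pairs are all assumptions and do not come from the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l2_logistic(X, y, l2=1.0, lr=0.1, n_iter=2000):
    """Fit logistic regression with an L2 penalty on the weights (not the bias)
    by batch gradient descent. X is a list of [clinical_score, ct_score] rows."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient of the mean log loss plus the L2 term (l2 / n) * w.
        w = [wj - lr * (gw / n + l2 * wj / n) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Integrated prediction score for one patient."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical (clinical model score, CT model score) pairs and outcomes.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.8, 0.7], [0.9, 0.6], [0.7, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_l2_logistic(X, y)
```

The L2 penalty shrinks both weights toward zero, which limits how strongly the combined score can depend on either individual model when the training cohort is small.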

The evaluation of model performance
We randomly divided the patients into the training cohort and the validation cohort in a ratio of 7:3. The overall predictive performance of the integrated model was measured in the validation cohort. The receiver operating characteristic (ROC) curves and the confusion matrices of all established predictive models were plotted to compare the performance of the predictive models. A ROC curve is a graphic plot used to illustrate a binary classifier's diagnostic ability as the discrimination threshold varies. It is created by plotting the true-positive rate against the false-positive rate at different discrimination thresholds. Calibration plots were also drawn to assess the predictive performance of all the models.
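The construction of the ROC curve described above can be made concrete with a small sketch. It computes the (false-positive rate, true-positive rate) points by sweeping the threshold and obtains the AUC as the fraction of positive-negative pairs that are ranked correctly. The function names and the four-sample example are illustrative only.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative
    (ties count one half); equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold over observed scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))  # 0.75
```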

Baseline clinical features of included patients
In total, 103 patients were enrolled in the study after the screening for eligibility, of whom 23 patients (22.3%) developed COVID-19 ARDS. The flowchart of the patients' selection is provided in Figure 1. The baseline clinical features of the included patients are presented in Table 1. There were no missing data in our study.

A summary of collected CT images
Original chest CT images containing fields of the lung parenchyma were obtained from 103 patients. The total number of included CT images was 3,187, of which 690 CT slices were from patients who developed COVID-19 ARDS and 2,497 CT slices were from patients who did not. We manually classified 897 CT slices from 30 patients into normal CT images or abnormal CT images.

Figure 2. The nomogram plot for the prediction of COVID-ARDS.
The nomogram plot was drawn based on the result of the multivariate logistic regression model (Figure 2). The risk score and the corresponding probability of COVID-19 ARDS can be calculated using the nomogram.
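A nomogram is a graphical form of the underlying logistic regression: each predictor contributes points in proportion to its coefficient times its value, and the total maps to a probability through the logistic function. The sketch below shows that mapping only. The coefficient names and values are placeholders and are not the fitted values from this study.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic regression probability: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * values[name]
                             for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))

# Placeholder coefficients and inputs, NOT the fitted values from this study.
coefs = {"x1": 1.0, "x2": -0.5}
p = predicted_probability(intercept=-1.0, coefficients=coefs,
                          values={"x1": 2.0, "x2": 2.0})
print(p)  # 0.5, because the linear predictor is -1.0 + 2.0 - 1.0 = 0
```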

The predictive performance of models based on clinical features
We developed four machine learning models to predict COVID-19 ARDS, including logistic regression, support vector machine, random forest, and extreme gradient boosting. The ROC curves of all the machine learning models are shown in Figure 3A. The area under the ROC curve of the XGBoost model was 0.94, which was higher than that of the logistic regression model (AUC = 0.82), the support vector machine model (AUC = 0.77), and the random forest model (AUC = 0.92). We also performed the DeLong test to compare the AUC of the XGBoost model against those of the other three models (XGBoost model vs. logistic regression model, P < 0.001; XGBoost model vs. support vector machine model, P < 0.001; and XGBoost model vs. random forest model, P = 0.002). The calibration curves are provided in Figure 3B. The XGBoost model was finally chosen as the best machine learning model to predict COVID-19 ARDS in our study.

The predictive performance of the CNN model based on CT images
In total, 897 manually labeled CT slices were used to train the classification CNN model based on individual CT images. Figure 4A shows the ROC curve of the classification CNN model (AUC = 0.99). The confusion matrix of the classification CNN model is shown in Figure 4B. The normal CT slices and the abnormal CT slices were correctly distinguished by the classification CNN model. The calibration curve plot indicated a good agreement between the predicted probabilities of COVID-19 ARDS calculated by the predictive models and the actual outcome (Figure 5B). The confusion matrices were plotted using clinical features, CT images, and integrated data to predict COVID-19 ARDS (Figure 6). We found that the integrated deep learning model could yield more accurate predictions than the individual model based on clinical features or CT images. More details about the predictive performance of the models are provided in Table 3.
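The agreement between predicted probabilities and actual outcomes that a calibration curve displays can be computed by binning the predictions. The following sketch uses equal-width bins; the number of bins and the six example predictions are assumptions, and the binning used for the figures in this study is not stated.

```python
def calibration_bins(y_true, p_pred, n_bins=5):
    """Group predictions into equal-width probability bins and return
    (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    summary = []
    for members in bins:
        if members:
            mean_p = sum(p for _, p in members) / len(members)
            rate = sum(y for y, _ in members) / len(members)
            summary.append((mean_p, rate, len(members)))
    return summary

# Hypothetical outcomes and predicted probabilities.
outcomes = [0, 0, 1, 0, 1, 1]
predicted = [0.05, 0.15, 0.45, 0.55, 0.85, 0.95]
for mean_p, rate, count in calibration_bins(outcomes, predicted):
    print(round(mean_p, 2), rate, count)
```

A well-calibrated model has bins whose observed event rate lies close to the mean predicted probability, that is, points near the diagonal of the calibration plot.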

Discussion
The outbreak of COVID-19 led to a global pandemic, and the main causes of the deaths were pulmonary complications such as acute respiratory distress syndrome. A comprehensive analysis of the clinical symptoms, laboratory test results, and CT images is crucial to help understand the scope of COVID-19. We believe that an ensemble predictive model based on the integrated data from the patients could provide more information about the risk factors of complications such as ARDS brought on by COVID-19. Moreover, detailed and accurate risk evaluation of COVID-19 ARDS is important for clinicians to provide more personalized treatment to patients. Some published studies have applied advanced artificial intelligence methods to predict the prognosis of COVID-19 (17)(18)(19)(20). They demonstrated the value of machine learning algorithms for predicting the outcomes of COVID-19, but no radiology information was included in the studies (21,22). Lee (25). In this study, the infection fields of the lung were segmented for the quantitative analysis of the volume and density. We thought the quantitative analysis of CT images could not make the most of the CT information and thus may yield less accurate predictions.
In this retrospective study, we developed three models for the prediction of COVID-19 ARDS. Two individual models were established based on the clinical features data and the CT images, respectively; the third deep learning model was built by integrating the two individual models. We found that the integrated deep learning model could offer better discriminatory performance for predicting COVID-19 ARDS than the two individual models. To strengthen the understanding of COVID-19 ARDS, we performed multivariate logistic regression to identify the independent risk factors associated with COVID-19 ARDS and drew the corresponding nomogram plot. We found that age, the concentration of C-reactive protein (CRP), the PaO2/FiO2 ratio, the count of total T lymphocytes, and the level of IL-6 were related to COVID-19 ARDS. The deterioration of the immune response in older patients may be the reason why advanced age is a risk factor for COVID-19 ARDS (26). COVID-19 manifests as a multisystemic disease, and the hyperinflammatory response is strongly associated with its outcome (27). COVID-19 ARDS also causes typical lung pathological changes, which are accompanied by acute and chronic inflammation (28,29). High concentrations of CRP and IL-6 may indicate a pro-inflammatory state, which has been reported as a risk factor for a severe outcome (26,27). It is reported that critically ill COVID-19 patients exhibited a status of immune cell hyporesponsiveness when compared to healthy people (28). Several studies have highlighted the value of T-lymphocyte subset absolute counts in predicting morbidity in COVID-19 patients (29)(30)(31).
The XGBoost model was selected as the best model to handle the clinical features data because it showed the best predictive performance in the validation cohort. XGBoost stands for "Extreme Gradient Boosting", an implementation of the gradient boosting framework first proposed by Friedman (32). The XGBoost model is an ensemble learning algorithm, which makes precise predictions based on a series of weak classifiers, and it has been applied in many studies to deal with massive medical data. The CT scan procedure can provide more information about the severity of lung damage and acute respiratory failure with a much faster turnaround time (2, 33). The distinctive characteristics of CT slices from COVID-19 ARDS patients could be captured by the convolutional neural network. In our study, the predictive performance of the VGG-16 model was better than that of the model based on the clinical features data. The VGG architecture was first proposed by the Visual Geometry Group from Oxford and ranges from 11 to 19 layers (34). The VGG models are widely used as image classifiers or as the basis of newly developed models that also use images as input data. The VGG-16 network was first used to classify the individual CT slices into normal and abnormal images. Furthermore, the individual patient-based prediction of COVID-19 ARDS was also performed by the VGG-16 network. The XGBoost model and the VGG-16 network model are complementary to each other. The predictive performance of the integrated model was superior to that of the individual ones. The integrated deep learning model we proposed predicted COVID-19 ARDS with high accuracy in our study. The tremendous progress made in the field of artificial intelligence has facilitated the analysis of massive medical data. Our deep learning model may be one example of an automatic analysis tool that can be used for various medical data or for alert systems for adverse events in critically ill patients.
Once the integrated deep learning model is incorporated into a hospital's information system, it could rapidly identify patients at high risk of COVID-19 ARDS without redundant operations.
There are some limitations in our study. First, this is a single-center retrospective study with a relatively small sample size. Second, the validation of the predictive model was only performed in the internal cohort. It is unclear whether similar predictive performance can be observed when our models are applied in other medical centers.

Conclusion
In our study, we established different models to predict COVID-19 ARDS. We found that the models based on the clinical features or the CT images could provide accurate predictions of COVID-19 ARDS. Moreover, the integrated model combining the two individual models exhibited the best predictive performance with the highest accuracy and AUC.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the institutional Ethics Committees at Shanghai Renji Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions
YG and ZH contributed to the conception and design of the study. YZ and SQ organized the database. YZ and JF performed the statistical analysis. SM, SX, and QX wrote the first draft of the manuscript. YZ, JF, ZH, ZZ, and RT wrote sections of the manuscript. All authors contributed to the manuscript revision, read, and approved the submitted version.

Funding
This study was supported by the Shanghai Science and Technology Commission (22YF1423300) and the Renji Hospital Clinical Research Innovation and Cultivation Fund (RJPY-DZX-008).